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Due to increasing research being conducted at ICASE in the field of fluid mechanics, 
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and numerical mathematics reports will have the familiar blue cover, while computer science 
reports will have yellow covers. In all other aspects the reports will remain the same; in 
particular, they will continue to be submitted to the appropriate journals or conferences for 
formal publication. 
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ABSTRACT 

Although the advent of fast and inexpensive parallel computers has ren- 
dered numerous previously intractable calculations feasible, many numerical 
simulations remain too resource-intensive to be directly inserted in engineer- 
ing optimization efforts. An attractive alternative to direct insertion considers 
models for computational systems: the expensive simulation is evoked only to 
construct and validate a simplified input-output model; this simplified input- 
output model then serves as a simulation surrogate in subsequent engineering 
optimization studies. We present here a simple “Bayesian-validated” statisti- 
cal framework for the construction, validation, and purposive application of 
static computer simulation surrogates. As an example, we consider dissipation- 
transport optimization of laminar-flow eddy-promoter heat exchangers: paral- 
lel spectral element Navier-Stokes calculations serve to construct and validate 
surrogates for the flowrate and Nusselt number; these surrogates then repre- 
sent the originating Navier-Stokes equations in the ensuing design process. 


'This work was supported by DARPA Grant N00014-91-J-1889; ONR Grant N00014- 
90-J-4124; ONR Grant N00014-89-J-1610; and by the National Aeronautics and Space Ad- 
ministration under NASA Contract No. NAS1-19480 while the author was in Engineering 
(ICASE), NASA Langley Research Center, Hampton, VA 23681. 


PRECEDING PAGE BLANK NOT FILMED 




1 Introduction 


Large-scale numerical calculation, such as fluid flow simulation, is an increas- 
ingly significant component of engineering and scientific analysis. However, 
despite recent advances in both algorithms and architectures, many relevant 
individual calculations still require many hours of expensive supercomputer 
time. As a result, direct insertion of these resource— intensive simulations as 
“subroutine” calls in particular design and optimization studies is typically 
not viable. First, the number of objective-function evaluations required to 
find an optimal, or even reasonable, solution to an optimization problem will 
not be known a priori. Direct insertion of expensive simulations may ex- 
haust allocated resources before interesting — or even feasible — solutions 
are obtained. Second, effective engineering design and optimization processes 
are evolutionary, with goals and constraints continually modified to reflect 
newly available information or specifications. Direct insertion of large-scale 
calculations strongly inhibits adaptability: with each revision of objectives, 
previous computations must be discarded, and a new sequence of expensive 
simulations must be initiated. Third, the value of expensive numerical simu- 
lations can be greatly enhanced by proper incorporation of prior information 
derived from collateral analytical, experimental, or heuristic investigations. 
Direct insertion of simulation results renders model fusion and validation dif- 
ficult. Fourth, most design and optimization exercises are multidisciplinary 
in nature [1], involving numerous relatively distinct fields of physical inquiry 
(e.g., fluid mechanics, solid mechanics, physical chemistry). Direct insertion 
of diverse simulations affords little opportunity to accomodate — or exploit 
— differing degrees of complexity and sensitivity. In summary, if large-scale 
computation is to graduate from analysis to synthesis , new paradigms are 
required. 

One attractive solution to the simulation-integration impasse considers 
models for computational systems: the expensive, large-scale simulation, de- 
noted Ad°, is evoked only to construct and validate a simplified compu- 
tational model, denoted this simplified model, Ai, then serves as an 

inexpensive surrogate for in subsequent engineering applications. The 

simplified model M. can be evaluated effectively ad infinitum , can support 
a large class of objective functions, can readily accomodate extra-simulation 
information, and can be easily incorporated into multidisciplinary design 
studies. The application of models for computational systems does, however, 
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raise new questions, in particular as regards purposiveness: to what extent is 
the design procedure misdirected, or proposed designs mispredicted, by the 
introduction of approximate simulation surrogates? 

In this paper we develop — and apply — a complete surrogate frame- 
work for optimization based on the simple validation concepts presented in 
[2]. More broadly, the work is founded upon several related streams of in- 
quiry. From system identification (control) theory [3-6] we borrow the notion 
of algorithmic logical empiricism, in which available data is systematically 
incorporated into the model construction and validation processes; from the 
design of experiments [7] we appreciate the need for sampling heuristics and 
response surfaces; from statistical prediction rules and artificial neural net- 
works [8-11] we adopt the concept of “construct and validate” — or “train 
and test” — data partitions; from the theory of machine learning [12,13] we 
appropriate the “probably approximately correct” framework; from Monte 
Carlo methods [14] and the classical equivalence of measure and probability 

[15] we derive our sampling procedures; from nonparametric statistical theory 

[16] we deduce our statistical error estimates; from scattered-data method- 
ology [17] we derive our model-construction procedures; and from statistical 
quality-control theory [18,19] we adapt relevant a posteriori reliability con- 
cepts. Lastly, our work, in philosophy, is most closely aligned to earlier 
seminal efforts in statistical simulation surrogates [20-23], in which, first, 
the need for surrogates is motivated, second, the special role of statistical 
statements is recognized, and third, the idiosyncrasies of (largely determinis- 
tic) computer experiments are identified; other “non-surrogate” statistically 
motivated approaches to the incorporation of expensive simulations into op- 
timization studies [24] are also relevant to our study. 

The paper is organized as follows. In Section 2, we describe the op- 
timization framework in which surrogates will ultimately be applied, dis- 
cuss the general class of subproblems for which our surrogate methods are 
most appropriate, and introduce the particular laminar-flow eddy-promoter 
heat exchanger optimization study that will serve as our detailed illustra- 
tion. Finally, we reiterate the motivation for the surrogate approach, for- 
mally define the surrogate problem, and describe the broad methodolog- 
ical guidelines that effective surrogate procedures should honor. In Sec- 
tion 3, we present our modelling methodology, treating both validation and 
construction-validation; the algorithms and error estimates are described, 
and results for the eddy-promoter heat exchanger are presented. In Section 
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4, we consider the incorporation of surrogate techniques into the full opti- 
mization framework, with particular focus on purposiveness and a posteriori 
analysis; the surrogate-based optimization approach is illustrated for the 
eddy-promoter heat exchanger problem. In Section 5, we consider several 
extensions to the surrogate methodology: classification maps; databoards, 
and multiple-output estimates. Lastly, in Section 6, we briefly state our con- 
clusions. (For clarity and self-containedness, we include here some material 
already discussed in [2], in particular as regards the validation procedure; 
however, the optimization framework is new, as is the treatment of a “real” 
application, that is, a problem which truly requires a surrogate approach.) 


2 General Problem Statement 

2.1 Optimization Framework 

In this paper we develop simulation surrogate techniques designed to func- 
tion as part of a larger optimization study: we therefore require a general 
optimization framework in which to interpret our results. We emphasize that 
this paper is not concerned with more classical optimization issues such as 
optimality conditions and mathematical programming techniques [25]; we 
assume that our optimization problems are well posed, and that procedures 
exist to find at least local, and preferrably global, optima. 

We first introduce a bounded, lower semi— continuous objective function, 

$(£;i):fixA -H, (1) 

where £ is the optimization design A/-vector, ft C JR M is the admissible 
(closed) domain for p, or “design space,” A is the optimization definition N- 
vector, and A C iFvis the admissible domain for X. The A vector comprises 
coefficients which, as regards the optimization process, may be treated as 
parameters. Our minimization problem is thus: Find $min(A)i£ (A) such 
that 

$min(A) = *(E*(A); A) < *(E? A) V E 6 ft , (2) 

where $mi„(A) and £*(A) are the minimum and minimizer, respectively. We 
explicitly introduce the dependence of the minimum and minimizer on A t0 
underscore that our optimization problem is, in fact, a family of optimization 
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problems parametrized by A. We define $nun(A) 35 a global minimum to 
emphasize that our surrogate techniques are intended to serve not only final, 
local optimization studies, but also initial, exploratory design efforts. 

We are interested in a particular class of optimization problems in which 
the objective function can be written in terms of a subproblem. 


$(£;A) = 4>(j£(£);£;A) . 


More explicitly, we can think of evaluating $( £ ; A) as 


E 


s n flE) — 


( 0 ) 


K 


s e 


>K 


( 3 ) 


$(£;A) = <£(i;£;A) , 

where p is the subproblem input M-vector, fl is the subproblem input do- 
main, s E 2R A is the subproblem output vector, £(p) is the subproblem 
input-output function, and L°°(Cl) is the space of bounded measurable func- 
tions over the domain fl. (We believe that most of our results require only 
that £( £ ) be in L°°( fl) A ; however, all mathematics in this paper must be 
considered purely formal pending complete hypotheses and serious proofs.) 

We shall further assume that the deterministic subproblem input-output 
function, <£( £ ) : 1R M — » 1R K is expressed as a functional, applied to a field 
U(x,f; £ ), 

£(£) = I(U(-, •;£);£) , (4) 

where U(x, <; £ ) satisfies the initial-boundary-value field subproblem , 


M p — = A £ (U) (5) 

U(x,f = 0; £ ) = U°(x; £ ). (6) 

Here J_ : X x fl — ► IR h is the output functional; X is the function space in 
which the field subproblem solution U(x, f; £ ) resides; x and t are space and 
time, respectively; M p and are deterministic spatial differential operators 
(and associated boundary conditions) parametrized by the input vector £ ; 
and U°(x; £ ) is the initial condition on U(x, t‘,p). 

We give here a very simple illustration from incompressible fluid dynamics 
intended to render the abstract subproblem framework more comprehensible. 


4 



To wit, we consider the drag coefficient for flow past a cylinder, in which 
we identify: p as a single input (M = 1), the Reynolds number; ft, the 
input domain, as the Reynolds-number interval of interest; s as a single 
output ( K = 1), the drag coefficient; j£(g) as the drag coefficient-Reynolds 
number relationship; J. as the time-averaged streamwise component of the 
integral of the stress-normal product over the cylinder surface; U(x, t\ g) as 
the {velocity, pressure} pair; M p and A p as the incompressible Navier-Stokes 
system, in which the Reynolds number enters as a parameter. (Note that, for 
reasons described in Section 2.2, we will typically not be interested in either 
time or space as an input: all temporal and spatial dependence is eliminated 
by either by evaluating the subproblem field at a particular point, by 
averaging over time or space, or by considering properties of asymptotic, 
steady, or stationary solutions. For this simple example, we perform temporal 
and spatial averages of temporally stationary solutions.) 

We make four final remarks. First, we have equated the input variables 
and the design variables (and hence the input space and the design space); 
more generally, the input variables may comprise only a subset of the design 
variables, g, but may also include certain definition variables, A- The former 
would be fortunate, reducing the size of the subproblem; the latter would be 
unfortunate, reducing the flexibility of a single surrogate to readily address 
several different optimization problems. Second, we presume that the field 
subproblem must be solved numerically, but that, for the purposes of this 
paper, all numerical errors are sufficiently small that we may equate the nu- 
merical and exact solutions. Third, as shall be discussed in Section 2.2, we 
shall be particularly interested in subproblems for which j2(g) is computa- 
tionally expensive to evaluate (that is, the field subproblem is difficult); we 
shall assume, however, that, once £ = £(g) is known, the objective function 
A) can be inexpensively evaluated as <£(s;g; A). Fourth, for the optimiza- 
tion problems we consider, the computational complexity — and the greatest 
opportunity for improved efficiency — resides not in the search process, but 
in the objective function evaluation. 

2.2 Class of Subproblems of Interest 

The surrogate approach is particularly appropriate for optimization problems 
in which the subproblem satisfes the following three “complexity conditions.” 
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Condition Cl: The <$(£) are expensive to evaluate in terms of computer costs, 
elapsed time, or human effort. This condition may appear to be transitory, 
given the continual and rapid decrease in computational times and costs [26]. 
We claim, however, that as computational capacity increases, problem size 
will also grow to accomodate: increased (perhaps finally adequate) resolu- 
tion; higher fidelity mathematical models; increased physical complexity of 
new technologies. □ 

Condition C2: Sharp regularity information on the £(g), such as a Lipschitz 
condition, 

Ij£(p 2 ) -£(£,)! < -£j| 

is difficult to obtain and problem-specific. This implies, in effect, that very 
little regularity can be assumed of the input-output function £(p).D 

Condition C3: Knowledge of £(£) at one input value, g , is of minimal com- 
putational value in evaluation of £(£) at a second input value, £ 2 : subproblem 
evaluation enjoys no computational economy of scales. This condition pre- 
cludes certain {subproblem; input; diagnostic} triples, such as { • ; time; • }, 
and {steady Navier-Stokes; Reynolds number; Newton continuation}. In 
both these cases — assuming sufficient regularity — later calculations can 
exploit earlier results in order to reduce computational effort. (As we are 
not considering time as an input, our surrogates are “static;” this does not 
imply, of course, that the field subproblem involves only steady phenomena, 
nor that the outputs may not contribute to a time-dependent model. )□ 

In colloquial terms, for a subproblem which satisfies these three conditions: 
we can not afford to generate subproblem solutions for many different input 
values (Cl,C3); we can not consider the subproblem solutions at a few input 
values to be representative of the entire input space (C2); we can not sim- 
plify subproblem evaluations by exploiting special features (such as locality) 
of the parent optimization problem (C2,C3). We believe that, unfortunately, 
there are many problems which approximately satisfy these conditions. 

Remark on Physical Experiments. It is of interest to ask why our meth- 
ods are not as appropriate for experimental investigations as for computa- 
tional systems, and, conversely, why experimental data analysis techniques 
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are not directly appropriate for computational systems. Considering the 
former, first, quite apart from noise, physical (in particular continuum me- 
chanics) experiments often do not satisfy condition C3; for example, once 
an expensive flow apparatus is configured, there are great economies of scale 
in (in fact, opportunity costs incurred in not) obtaining data for a large 
number of flowrates, rather than just a few. This is because much experi- 
mental equipment is problem-specific, amortized over only a particular class 
of inquiries, and because elapsed time is not a serious consideration in many 
(though not all) laboratory environments. Second, despite recent advances in 
transducers and imaging, experimental data at a particular input value, £ , is 
already greatly reduced with respect to analogous numerical data; whereas 
raw simulation data resident on a databoard (see [2] and Section 5) can 
be subsequently processed to produce a wide range of different outputs, ex- 
perimental data can serve only those applications requiring the few outputs 
selected in the originating investigation. 

Turning now to why well- developed experimental data analysis techniques 
are not directly applicable to simulations, first, many experimental inquiries 
assume significant noise levels. In contrast, although computational inquiries 
do contain difficult-to-quantify factors (e.g., resolution, incomplete iteration) 
that may perhaps be gainfully interpreted as noise, these factors tend to be 
both relatively small and largely controllable. Many experiment-design tech- 
niques developed to reduce or understand uncontrolled factors (e.g., blocking 
and randomization [7]) are thus largely irrelevant in the computational arena 
[21]. Second, many experimental surrogate (e.g., response surface) methods 
are premised upon assumptions of both smoothness and locality (e.g., linear 
models, fractional factorial designs [7)); although these assumptions may be 
necessary for noisy experiments, deterministic simulations can benefit from 
less restrictive hypotheses. □ 

2.3 Eddy— Promoter Heat Exchanger Example 

2.3.1 The Optimization Problem 

As our physical problem we consider two-dimensional laminar flow and con- 
vective heat transfer in the eddy-promoter heat exchanger shown in Figure 
1. In overview, the eddy-promoter channel comprises a two-dimensional 
(infinite in x 3 ) plane channel with plate separation (in x 2 ) 2 h, geometrically 
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interrupted by an infinite periodic array of insulating cylindrical eddy pro- 
moters of bottom wall-to-cylinder spacing a, radius R, and pitch (cylinder- 
to-cylinder -separation) L. Heat enters the channel through an isother- 
mal bottom wall maintained at temperature T\ (representing, for exam- 
ple, a highly conducting plate housing electronic components) and leaves 
the channel through an isothermal “cold” top wall maintained at tempera- 
ture T 0 . A fluid flow driven by an imposed pressure gradient, dp/dx\k\ — 
and excited to significant cross-stream transport by the eddy-promoters — 
serves to reduce the bottom wall-to-top wall thermal resistance. We are 
interested in determining that eddy-promoter placement and radius which 
minimizes pumping power (e.g., operating cost) and eddy promoter volume 
(e.g., materials cost) while simultaneously maintaining a temporally and spa- 
tially averaged bottom-wall heat flux (e.g., electronic component density), 
< F >1 not much less than < F > nom . In more quantitative terms, we 
wish to: minimize fox the pumping power + fox R?+ fox & penalty if 
{< F > nom < F >)/ < F >„ om > 0; with respect to eddy-promoter 
placement a and radius /?; for various objective-function weights fo , fo and 
fo and thermal loads < F > nom . 

(Notational aside: dimensional variables shall carry carats; length, veloc- 
ity, and time wi ll be nondimensionalized with respect to h, dp/dx x h 2 /2pv, 
and 2pv/dp/dxih, respectively; temperature will be measured relative to T 0 
and nondimensionalized with respect to A7 1 = 7\ — Tq. The incompressible 
working fluid is characterized by a constant density, p, kinematic viscosity, v , 
thermal conductivity, k, and thermal diffusivity, a. The domain associated 
with one periodicity length of the channel (arbitrarily positioned as shown 
in Figure 1) will be denoted Z), with the bottom and top walls denoted B\ 
and B q , respectively, and the eddy-promoter surface denoted Be- A generic 
point in D is (zi,z 2 ); the fluid velocity, pressure (perturbation from the im- 
posed linear field), and temperature are written as u = rijei + tt 2 e 2 , p', and 
T , respectively, where (ei,e 2 ) are the unit vectors associated with the two 
coordinate directions. Angular brackets, < • >, refer to time average with 
respect to a temporally stationary state.) _ 

We can pose the (nondimensional) eddy-promoter heat exchanger opti- 
mization problem as an instantiation of the general framework of Section 
2.1: <b,<f> I— ► $ EP ,(^ EP ; M i— ► M tp = (5) 2, the number of design variables; 
£ t ► £ ep = {(Rej,(Pr),a,R,(Z)}; ^ fi EP = {.1 < a <1,-05 < R < 

a — .05} (see Figure 2); N i — * N EP = 4, the number of definition variables; 
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^ M i !P = {fa, fa, fa,*}', A *-* A EP = 2R+. Here Re and Pr denote the 
Reynolds number and Prandtl number, defined as Re = —dp/dxh 3 /2pu 2 and 
Pr = i>/d, respectively, and k =< F > nom 2h/kAT is the nondimensional 
thermal load. Note that for the purposes of our optimization problem the 
Reynolds number, Prandtl number, and eddy-promoter pitch are fixed at 
Re = 300, Pr = 1, and L = 6.666, respectively, leaving only two design 
variables, the nondimensional eddy promoter placement, a, and radius, R\ 
we have indicated parenthetically that a more general optimization problem 
might involve five design variables, in which Re, Pr, and L are also free to 
vary. 

As our objective function we take 

* EP (a, R\ A EP ) = <t>”{G{a, R), Q(a, R); R, (Re)-, A EP ) (7) 

where 

q; R, (Re)-, fa , fa, fa, «) = fa(Re) 2 g + faR 2 + fa H{\ - 9 /«) . (8) 

Here g = Q(a, R) is the time-averaged flowrate through the channel, 

g(a,R) = ±J D <u l >dx, (9) 

and q = Q{a, R) is the time-averaged Nusselt number, 

Q{a,R)=< F >2h/k&T . (10) 

From < F >= i /b, < ~k§£ > dfa, it follows that 

Q{a ' R) = lb < ~fal >dXx ' (U) 

It is readily shown that, as required, [Re) 2 g is the pumping power per peri- 
odicity length per unit depth ^nondimensionalized by bpifijh 3 ), and 1 — q/n 
is (< F > nom - < F >)/ < F > nom . The penalty function H{z) is chosen 
to be H{z) = H(z)z 2 , where H(z ) is the Heaviside distribution. It is crit- 
ical to note that, even in a real (not just illustrative) design exercise, the 
particular, initial, form for the objective function is not overly important, 
as surrogate techniques are designed to permit significant variation in the 
objective function at low marginal cost. Indeed, it is often only through ob- 
serving optima and varying the objective function that design-goal intuition 
is clearly articulated. 
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2.3.2 The Eddy-Promoter Subproblem 

We next identify our eddy-promoter subproblem with the general subprob- 
lem described in Section 2.1: A/ i — ► A/ EP = (5) 2, the number of inputs; £ 
£ ep 5 {(Re),{Pr),a,R,{L)}-, fi h* Q ep = {.1 < a < 1, .05 < R < a - .05}; 
K h+ R' EP = 2, the number of outputs; §_ *-* £ EP = {</,?}; £(£) »-* j£ ep (£) = 

{£(£), C(£)}; 1 •-» {jx Jd < > dx, £ / Sl < > dxi}-, U •-» {u,p', T); 

M p , A p >— ► Navier-Stokes and forced convection. More explicitly, the Navier- 
Stokes field subproblem for (u(xj,X 2 ,<),p , (x 1 ,x 2 ,<)) is given by, 


dui du{ 

~m +u ‘W, 


duj 

dxi 


u 

(tt*p )(^i "1" mf/,x 2 ,f) 


, l A 3 ** < _2_r 

3x, Redxjdij Re ,l 


in D 


( 12 ) 


0 in D 

0 on B 0 U B\ U Be 
(u,p')(x 1 ,x 2 ,t) Vm € Z , 


(13) 

(14) 

(15) 


and the forced-convection energy equation for T(x 2 ,x 2 , t) is given by, 


dT 


dT 


dt U} dx 3 
T 

dT 

dn 

T(x i + 


1 


&t 


in D 


RePr dijdij 
0(1) on Bo(^i) 

0 on Be 

T[xi,x 2 ,t) Vm € Z 


(16) 

(17) 

(18) 

(19) 


Here is the Kronecker-delta symbol, dj dn denotes differentiation in the 
direction of the boundary normal, Z is the set of integers, Re = 300 and Pr = 
1, free indices range over {1,2}, and summation (£*) over repeated indices is 
assumed. Initial conditions are not important since, at this Reynolds number, 
only a single attractor appears to exist for all {a,R} £ fi EP . 

We solve the Navier-Stokes and energy equations with the NEKTON code 
on the Intel iPSC/860 multiprocessor. The numerical method comprises: 
fractional timestepping schemes [26-28]; spectral element spatial discretiza- 
tions [29]; and parallel [26,30] deflated [31] multilevel conjugate gradient [28] 
iterative solution procedures. A typical calculation proceeds by: specification 
of {a,i?}; automatic spectral element mesh generation from several skeletal 
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templates; automatic parallel partition; integration in time until a temporally 
stationary state is achieved; evaluation of the requisite output functionals. 
The spectral element templates are constructed to permit relatively undis- 
torted meshes and adequate resolution even for extreme cylinder placements 
and sizes. We have confirmed the mesh independence of our calculations 
[32], and have verified our results, where possible, with other numerical pre- 
dictions and experiment [33-37]. For a typical {a,R} € fl EP , roughly $i5 
and 6 16-processor hours are needed to reach the steady or steady-periodic 
state required to evaluate the flowrate and Nusselt number; note that each 
subproblem evaluation would cost as much as $750 on a sin^/e-processor 
supercomputer [26]. 

2.3.3 Flow Phenomena 

We aim to find $^ n (A) and £ EP *(A EP ) a KU"), «*(A EP )} such that, V{a, A} 
€ n EP ,^„(A EP ) = $ EP (a*(A EP ),-R*(A EP );A EP ) < <fr EP (a, .ft; A EP )- We empha- 
size that this design problem is not local: the triangular admissible design 
space, fl EP , shown in Figure 2, admits virtually all geometrically possible 
cylinder placements and radii. More importantly, different flow phenomena 
occur in different regions of the design space. Figure 3 depicts the isotherms 
for three representative {a,R} points in fl EP ; for a « .5 unsteady steady- 
periodic supercritical Tollmien-Schlichting wall-mode channel waves obtain 
(Figure 3a); for large R steady wavy flows predominate (Figure 3b); for very 
large R the flow is effectively blocked (Figure 3c) [32]. (Recall that the 
Navier— Stokes calculation is at fixed pressure gradient, not fixed flowrate.) 

This paper is not concerned with the details of eddy-promoter flows ex- 
cept to the extent that these details illustrate essential aspects of the surro- 
gate procedure. Readers interested in more details on dissipation- transport 
optimization of eddy-promoter systems are referred to [32,34-36]. These 
studies treat more realistic boundary conditions (e.g., in which the heat is 
carried away by the fluid flow) and optimization objective functions, and ad- 
dress a wider range of both laminar flows (numerically and experimentally ) 
and transitional and turbulent flows (experimentally). However, extensive 
optimization with respect to geometric inputs (e.g., a and R) has not been 
undertaken, due to the expense (in this case, in both the numerical and 
experimental contexts) associated with system modification and re-analysis. 
Readers interested in more details on the flow physics , in particular the hydro- 
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dynamic stability, of eddy-promoter (and related) flows may consult [32,35- 
38]; these papers interpret eddy-promoter bifurcations as the interaction of 
si mpler-geomet r y shear, cylinder, and channel instabilities. Recent quiet ex- 
periments and inflow-outflow numerical simulations [37] indicate that the 
initial instability is convective, not absolute; however, noisier experiments 
[36] agree quite well with periodic calculations, suggesting that in engineer- 
ing applications the assumption of spatial periodicity may be acceptable for 
sufficiently long channels. 

We claim the eddy-promoter subproblem satisfies the three conditions of 
Section 2.2: the computation is expensive and time-consuming (Cl); bifur- 
cations preclude sharp regularity estimates (C2); complex time-dependence 
and geometry variation precludes continuation methods (C3). 

2.4 The Surrogate Approach 

In the “direct insertion” approach to simulation-based optimization, the ob- 
jective function $(£, A) is evaluated, for each candidate £, as <j>(s; £; A), where 

Mp,An 7 

U(x, <; £ ) -±>seR h . 

s v— ■■ < 

2(p) 

The disadvantages of this approach are described in the Introduction: the 
number of evaluations of $(£, A) is not known a priori — resources may be 
exhausted before a sensible design is proposed; simulations evoked in a first 
optimization study with objective function $(p, A 1 ) will be of limited use in a 
second optimization study with objective function $(p, A 2 ) — design adapt- 
ability is frustrated; and systematic fusion of simulation results with prior 
analytical, heuristic, or experimental information is, at best, difficult. In 
short, it is difficult to perceive of a day-long thousand-dollar Navier-Stokes 
simulation as a function call from a mathematical programming routine. 

In the surrogate alternative, the subproblem simulation is evoked only 
to construct and validate a simplified input-output model, <5(p) : fl — + , 

which is intended to approximate j£(£) over the input domain Cl. This sim- 
plified input-output model then serves as a simulation surrogate in subse- 
quent optimization studies, that is, $(£, A) = <£(<£(£);£; A) is replaced with 

0GS(p);p; A) = $(PiA): $(p, A), not $(p, A), is minimized. Given the as- 
sumptions of Section 2.1, $(£,A) can be inexpensively evaluated for any 
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candidate £ as ^(|;£; A), where 

£ s n ^ i € R « . 

The surrogate problem can thus be stated as follows: Given a limited (or 
even fixed) number of appeals (recall conditions Cl and C3) to a largely 
uncharacterized (recall condition C2) but deterministic function, £(g); Find 
a simple but validated approximation to £(£) over H, £(£), which a) con- 
servatively but effectively exploits prior information, and b) can be gainfully 
incorporated into design and optimization studies. 

The advantages of surrogates are manifold: surrogates are, by construc- 
tion, inexpensive, and can be evaluated ad infinitum — premature termina- 
tion of the optimization procedure will not be required; a single surrogate 
can support a large A,-family of related objective functions — adaptive, non- 
incremental modification of design criteria and specifications is encouraged; 
surrogates can readily incorporate prior extra-numerical information con- 
cerning not only regularity, but also form — thereby reducing the computa- 
tional burden. The primary disadvantage of surrogates is the introduction of 
new errors into the optimization process due to the additional level of approx- 
imation, <H£(£);g; A) « 4(&£);£; A); this “purposiveness” issue is discussed 
in depth in Section 4. The many advantages (and significant disadvantage) 
of surrogates have long been recognized: computational scientists typically 
search for “insight not numbers”; engineers often exploit reduced-order mod- 
els. However, with the exception of relatively recent work [20-23]: the sur- 
rogate concept is rarely explicitly articulated; application of the surrogate 
concept is typically ad hoc\ and surrogates are not usually accompanied by 
useful error estimates. It is the latter shortcomings that we aim to partially 
mitigate. 

The surrogate problem statement and the complexity conditions Cl, C2, 
and C3 suggest several broad methodological guidelines. First, from condi- 
tions Cl and C3, we require error estimates for a fixed number of function 
evaluations; this implicates a statistical approach, in which uncertainty can 
be precisely accomodated. Second, from condition C2, we must presume that, 
in the general case, we know relatively little about our input-output func- 
tion; this implicates a nonparametric statistical approach. Third, from con- 
ditions Cl and C2, design sensitivity derivatives, though a powerful tool for 
both gradient-based minimization and post-optimization sensitivity analyses 
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[1,39,40], may not be sufficient: more global, general models for objective- 
function approximation must be admitted. Fourth, from conditions Cl, C2, 
and C3, we deduce that neither a priori nor a posteriori regularity-based 
approximation and estimation techniques are effective: we can not hope to 
be asymptotic (Cl); we will have very little insight into the proper norm, 
form, or constants of approximation errors (C2); a posteriori error analysis 
based on local subproblems or extrapolation will not be computationally vi- 
able (C3). These considerations suggest that new error norms are required. 
Fifth, from condition C3, we can assume at least partial decoupling of the 
parent optimization problem and the expensive subproblem; although results 
of the former will certainly affect the region in which we choose to examine the 
latter, the two tasks remain computationally relatively independent. Armed 
with this methodological outline, we now proceed to discuss the particular 
algorithms developed. 

Remark on Modelling. We assume here that our mathematical model, 
Ad°, accurately reflects the physical problem of interest, denoted Ad 00 . Our 
computational surrogate approach implicitly considers the modelling pro- 
cess in two stages: Ad 00 —* A d° — ♦ Ad. An alternative, one-stage, ap- 
proach, Ad 00 — + Ad, proceeds directly from the physical problem, Ad 00 , to 
a computationally simple engineering model, Ad, without passing through a 
large-scale-simulation intermediary, Ad° (or physical experiment). We be- 
lieve the two-stage approach is preferrable, as the more difficult problem of 
physical-to-mathematical translation is conducted at a level which accomo- 
dates greater complexity. This greater flexibility should not, of course, serve 
as an excuse for less discriminating modelling practices. □ 

3 Modelling Methodology 

In this section we consider both validation procedures, in which we assess a 
given »£(£), and construction-validation procedures, in which we both pro- 
pose and assess £(/?). As much of [2] is focussed on the motivation, analysis, 
and empirical verification of the validation and construction-validation al- 
gorithms, we confine ourselves here to a brief summary of the major points. 
We restrict ourselves initially to a single output, K — 1; extension of the 
theory to multiple outputs is described in Section 5. 
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3.1 Validation 

3.1.1 Algorithm 

The validation algorithm takes as given: (i) A subproblem, with an input 
M-vector, p, an input domain, ft G R M , a (single) output, s G R, and an 
input-output function, $( £ ) G L°°(ft). We prefer, but do not require, that 
the subproblem satisfy the three complexity conditions Cl, C2, and C3, of 
Section 2.2. (ii) A proposed surrogate, $( £ ) : ft — ♦ R. (iii) A strictly positive 
Bayesian importance function, 

p( E ) : ft R+' / n P(£) d E =1 > (20) 

which describes the a priori relative importance ascribed to different points 
within the input domain ft. As will be discussed in Section 4, this impor- 
tance function is best interpreted as a prior “density” for £ ”(A). We note that, 
for the error statements developed in Section 3.1.2, the importance function 
is required to ensure input-transformation objectivity, (iv) The maximum 
number of 5(g) evaluations permitted, N ev . This parameter describes the 
resource limitation associated with the validation exercise, (v) Two valida- 
tion error tolerances, ei,£j G [0, l] 2 , the significance of which will become 
clear in the validation error statement of Section 3.1.2. 

We now summarize the simple Monte-Carlo [14] Model Validation (MV) 
Algorithm of [2]. We note that this algorithm is, in effect, nothing more than 
randomization of obvious parameter-space exploration procedures; however, 
the introduction of a probabilistic framework permits an error statement 
which marries well with subsequent optimization analyses. 

Algorithm MV(5( £ ), ( £ ), />( £ ), N ev , £i, £ 2 ) 


1. SET N va = Af{ei,£ 2 ), 

Kf! \ I** ^2 

V(£!,£ 2 ) - b(1 _ £l ) ' 

2. IF N va > N ev , validation is not possible. 

3. FOR; = 1 ,...,N va 

51. DRAW £, ~p( £ ) 

52. COMPUTE Sj = S{Rj). 


( 21 ) 
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4. SET £ max = max je {i at-} Ej, where Ej = |5j -5(£j)|. □ 


Here 2C ~ /( x) refers to a random vector X. with probability density function 
/(x); we shall indicate a particular realization of a random quantity Y (vari- 
able, vector, or domain) as In the MV Algorithm, the P_ } are random 
vectors, and the Sj,Ej , and E max are random variables. 


3.1.2 Error Analysis 

The output of the MV Algorithm, the model prediction error estimator, 
Em&x i is related to the model validation error estimate, g 1? and the validation 
statement uncertainty, g 2 , by a precise probabilistic statement, 


Pr 


u 


{ P €n||5(p)-g(p)l<E m „} 


p{p)dp > 1 -£i} > 1 -g 2 , 


( 22 ) 


where Pr {event} is the probability that event occurs. In words, (22) states 
that, with probability greater than or equal to 1 — g 2 , |5(p) — 5(p) \ < E max 
over a region of Cl of relative weighted volume greater than or equal to 1 — e x ; 
equivalently, with probability greater than or equal to 1 — g 2 , |5(g) — 5(g) | > 
Em*x over a region of Cl of relative weighted volume no greater than gj. 
(The probability ensemble here is defined with respect to repetition of the 
algorithm: we expect that, in greater than 1 — g 2 of all realizations of the 
algorithm, |5(g)— 5(g)| < E m&x over a region of Cl of weighted relative volume 
greater than 1 - gj.) 

The critical aspects of the validation procedure are: first, a precise (al- 
beit probabilistic) error statement, (22), can be made for a fixed number 
of evaluations of 5(g); second, the sample-size requirement, (21), and re- 
sulting error estimate, (22), are nonparametric, valid for any functions 5(g) 
and 5(g); third, the validation statement, (22), requires no assumptions on 
5(g) or 5(g) as to regularity or functional form. Perhaps most importantly, 
the error estimate will also prove amenable to a posteriori analysis in the 
optimization context (see Section 4). The error statement (22) is readily 
interpreted in the probably approximately correct framework developed for 
classification problems in the theory of learning [12,13]; in the probably- 
approximately-correct context, finite uncertainty — in our case represented 
by £x and g 2 — permits a precise statement for a fixed sample size. 
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The derivation of (22) is given in [2] in terms of order-statistic tolerance 
limits [16,41)42]. We indicate here an alternative derivation, based on bino- 
mial considerations, that has the advantage of direct (multinomial) extension 
to the multiple-output case (see Section 5). We define F E to be the (perforce 
increasing, though not necessarily strictly increasing, nor continuous) cumu- 
lative distribution function of the random variable E = \S(P.)-S(F)\, where 
£ is a random vector with probability density p(£). The 1 — quantile of 
E , ei_ ei , is then defined by F E {e x . tl ) = \ - e x (more precisely, ex_ ei is the 
minimum z such that F E (z) > 1 — £i). Lastly, j max is any j € {1, . . . , N va ) 
for which E, = E max . Then, if 71_ e , = {£ € ft | \S(jo) — «S(£)| < e i-*i }> 

Pr{Vj € {1 JV~}, E, € T,-„ ) < (1 - £i) iv " • (23) 

It follows that at least one will lie in ft \ 7i_«, = {£ € ft H^Ce) — *^(e)I — 
e x _ fl } with probability greater than or equal to 1 — (1 — ei)^ . Furthermore, 
if at least one £, lies in Q \ 7]_ e , , then, since E max >Ej,Vj 6 {1 

in particular will lie in and thus F^E^x) > 1 — £i- Finally, 

recognizing F E (e) = / {p6 n | |5( E )-5( E )|< e } p(e) d R' and substitutin 6 from ( 21 ) 
(l-gj)^’ = e 2 , we obtain (22). This binomial derivation of (22) is essentially 
a classification argument; not surprisingly, the sample-size requirement (21) 
also appears in [13]. 

The origin of uncertainty in (22) is a random region 

14 = {p £ 0 | |5(g) ” <5(2)1 ^ ^max} ^ 

1 - e 2 probably of relative weighted volume less than or equal to £ x , oi unde- 
termined location and shape, over which the surrogate misfit, |<S(£) — 5(g) |, 
is unknown. The usual confidence interval-confidence level balance inherent 
in (21) has an interesting interpretation: if we consider — In £2 to be how well 
we know the simulation behavior, and —1/ ln(l — £ 1 ) to be how much of the 
simulation behavior we know, then, for a fixed number of appeals to 5(g), 
A rva , (21) implies that the product of how well and how much is fixed (in fact, 
equal to N va ). If we choose the deterministic limit, in which we tolerate no 
uncertainty (£2 — ► 0), then how well we know the simulation behavior tends 
to infinity, but how much of the simulation behavior we know tends to zero: 
we know the simulation behavior only at the points sampled, which is a set 
of measure zero. By permitting finite but controlled uncertainty in how well , 


17 


surrogates accomodate a finite how much: surrogates - probable but global 
— bridge the gap between direct computation, which is sure but pointwise, 
and analytical methods, which are sure and global. 

The balance between E\ and £2 is not “symmetric,” however. In particu- 
lar, £1 ~ — In £ 2 / N va as N va — * 00 for £ 2 fixed, corresponding to rather slow 
algebraic decay. For example, for N va = 22, we can choose £1 = . 1 , £2 = .1; 
doubling the number of evaluations to N va = 44 reduces £1 by only a factor 
of two, £1 — .05, £2 = .1. In contrast, £ 2 ~ e -«iW va as N va — * 00 for £j 
fixed, corresponding to rapid exponential decay. For example, again begin- 
ning with N va = 22 and £1 = . 1 , £2 = .1, doubling the number of evaluations 
to N va = 44 permits a tenfold decrease in £ 2 , £1 = -l , £2 = -01. It follows 
that, with only a modest number of evaluations of S(p), £2 will be sufficiently 
small that we can assume with near certainty that U is, indeed, of relative 
weighted volume less than or equal to e x ; the remaining uncertainties are the 
location of U , and the surrogate misfit, |5(g) — «S(p)|, over U. 

Remark on Dimensionality. Our algorithm, sampling requirement (21), 
and error estimate (22) apply independent of input-vector dimensionality, 
M. However, this generality is deceptive; as M increases, although the rel- 
ative volume of U remains invariant, U will reflect increasingly significant 
excursions in individual components of the input vector (e.g., compare the 
side length of a square and cube of the same volume). This implies that 
our technique will not be viable for too many subproblem inputs (although 
the number of design variables may be large). Surrogate techniques should, 
however, be extensible to problems of shape optimization [40], which are 
essentially infinite-dimensional, if geometrically motivated correlations be- 
tween the inputs are introduced in order to reduce — through p(g) — the 
effective volume of the input domain. □ 

3.1.3 Eddy-Promoter Example 

We apply the validation procedure to the eddy-promoter Nusselt number, 
Q(a, R). In order to evoke the MV Algorithm, the prerequisites listed in Sec- 
tion 3.1.1 must be supplied. The necessary quantities are, in fact, all defined 
in Section 2.3, save the proposed surrogate, Q(a,R), the importance func- 
tion, p(p), the maximum number of evaluations, N ev , and the uncertainty 
tolerances, £ x and £ 2 . For our purposes here we simply select for the surro- 
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gate Q(a, R) = 1, which corresponds to the Nusselt number for conduction in 
the channel in the absence of the eddy-promoters and any flow. (This simple 
surrogate is chosen for lack of a better heuristic; however, this example also 
illustrates application of surrogate procedures to test global stability, or 
sensitivity, of a solution — in this case the conduction solution — to vari- 
ations in the design variables.) We take the importance function, p(g), to 
be uniform over the triangular input domain, reflecting no prior knowledge 
as to which parts of the domain will prove more interesting in the ultimate 
optimization application. Lastly, we set N ev = 44, e x = .l,e 2 = - 01 ( b Y 
construction, N va = Af(e i = .1,£ 2 = -Of) = 44 = N ev ). 

Implementation of the MV Algorithm is now straightforward. First, we 
employ a standard acceptance-rejection Monte-Carlo method [14] and a con- 
gruential pseudorandom number generator [43] to produce N va = 44 input 
points which are randomly and uniformly distributed over the input domain 
fi EP ; the input points, {a,, Rj},j = 1, . . . , N va , resulting from one realization 
of this process are shown in Figure 4. Next, the Nusselt number is computed 
at each of the input points, q 3 = Q(aj, R,),j = 1, . . . , N va , following the field 
subproblem evaluation procedure described in Section 2.3. Lastly, we com- 
pute = maXj 6 {i jv««} | q, — (2(a>, -ft;) I = max >e{i n v “} kj — Mi f° r the 

realization shown in Figure 4, we find = .236. We thus conclude that, 

with confidence level greater than .99, the discrepancy between Q(a, R) (the 
Navier-Stokes solution) and Q = 1 (our simple surrogate) is less than .236 
over more than 90% of fl EP . The error in the surrogate is, expectedly, rather 
large, confirming that the flow departs significantly from conduction within 
the design space fl EP . To capture this departure in greater detail, we need 
to consider construction-validation procedures. 

3.2 Construction— Validation 

3.2.1 Algorithm 

The construction-validation algorithm takes as given: (i) A subproblem, 
with an input M-vector, £, an input domain, fi € St M , a (single) output, 
s € 1R, and an input-output function, «S(£) € T°°(n). (ii) A modelling 
(approximation) procedure, 

A : (fl x R) v - L°°(ft) , (24) 
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which, given V input-output pairs, Ej = {(?,, $(&)), . . . , (p p , S( £p ))}, gen- 
erates the surrogate rule, S(p). (iii) A Bayesian importance function, p(p), 
satisfying (20). (iv) The maximum number of <S(p) evaluations permitted, 
iV ev . (v) Two validation error tolerances, e u e 2 6 [0, l] 2 . (vi) The Model 
Validation Algorithm of Section 3.1.1. 

We now summarize a simple Monte-Carlo Model Construction- Validation 
(MCV) Algorithm based on random datasets [2]. 

Algorithm MC V(5(p), p(p), N' v , e u e 2 ) 

1. COMPUTE N va = Af(s i,£ 2 ) from (21). 

2. IF N va > N ev , QUIT; ELSE SET jV co ( ni ‘ ruce,on ) = N ev — N va . 

3. FOR j = 1, . . . , N co (random dataset): 

51. DRAW ~ p(p) 

52. COMPUTE S } = S(P } ). 

4. SET S( E ) = ^({(Zu^),...,^^^)}). 

5. CALL MV(5(p),5(p),p(£),^°,£ 1 ,£ 2 ) — E max . □ 

(For simplicity of presentation we indicate that the first N c0 input points 
serve for construction and the last N va input points serve for validation; in 
practice, a sample of N' v input points is drawn and then randomly parti- 
tioned into construction and validation subsets.) The constructed model and 
model prediction error estimator, £ max , satisfy our probabilistic validation 
statement, (22). Although the S(p) are, in fact, random, for the purposes of 
this paper we shall condition all results on a given model S(p). 

The MCV Algorithm presented is rather crude and inefficient. First, we 
would prefer to compare and select amongst different surrogates, choosing 
that model which incurs the smallest model prediction error estimate or 
which is computationally least expensive [5,6]. Second, we would like to 
adapt to information generated during the construction-validation process; 
a sequential approach offers clear advantages, permitting the algorithm — 
and the appeals to the expensive S(]>) — to terminate when the (or a) model 
prediction error estimate is sufficiently small. Both of these improvements 
are made possible by the multiple— output extension described in Section 5. 
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3.2.2 Classes of Models 

Models can be characterized in several ways: by dataset E v , determinis- 
tic or random; or by procedure, A , “graybox” or “blackbox. Although the 
modelling problem would appear to be a routine exercise, several factors com- 
plicate the process. First, the domain fi will often be irregular, precluding 
simple tensor-product techniques. Second, the input vector and domain. fi 
may be of high dimension, M: most complex-geometry interpolation proce- 
dures developed for partial-differential-equation applications in two or three 
space dimensions are increasingly cumbersome or computationally intensive 
with increasing space dimension; local, linearized models commonly used for 
multivariate response surfaces are inappropriate for our (global) purposes. 
Third, random datasets offer certain advantages within the surrogate con- 
text: scattered-data approximation procedures [17] are considerably more 
problematic than ordered datasets. 

We begin by comparing the relative advantages (marked with a 4- ) and 
disadvantages (marked with a — ) of deterministic and random datasets. 
First, deterministic datasets: (+) ensure the anticipated distribution is real- 
ized; (+) can exploit existing datasets; (+) permit a range of well-developed 
approximation procedures (tensor-product, “finite-element”); (+) permit a 
priori regularity-based approximation-error estimates; (— ) extend with some 
difficulty to complex fi, in particular for larger A/; (— ) preclude recycling of 
datapoints for (perforce random) validation (see [2] and Section 5). Random 
datasets: (-) exhibit fickle “distribution”; (-) disqualify existing (determin- 
istic) data; (-) permit only scattered-data approximation methods, such 
as Voronoi methods [2,44], modified Shepard techniques [17,45], and radial- 
basis-function approaches [11]; (— ) permit only limited a priori regularity- 
based approximation error estimates; (+) extend readily to complex fi; (+) 
extend readily to larger Af; (+) permit recycling for validation. 

We also briefly compare graybox and blackbox modelling approaches. 
Graybox models are intended to reflect prior information as to the antici- 
pated form of the phenomenon under consideration. The resulting model 
is “parametric,” involving a finite number of to-be-determined basis co- 
efficients representing a fixed-dimensional approximation space. Blackbox 
models are intended to be largely unbiased as to possible functional form, 
though clearly some minimal regularity assumptions are required. Black- 
box models are preferrably “nonparametric,” permitting the approximation 
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of arbitrary functions to arbitrary accuracy by consideration of a family of 
approximation spaces of increasingly large dimension. Graybox and black- 
box approaches can be gainfully combined as blackbox-corrected graybox 
models. 

In [2] we develop a random-dataset Voronoi-based piecewise-constant 
blackbox approximation procedure for complex domains which extends di- 
rectly and efficiently (linearly in complexity) with increasing input dimen- 
sion, M; the technique is applied to a problem of stress concentration in 
linear elasticity. Unfortunately, although the Voronoi method, which is ef- 
fectively a piecewise constant finite element approximation over convex tiles, 
does enjoy certain approximation properties, the method is too low-order 
to make effective use of the perforce limited datasets available for surrogate 
construction. In the current paper we choose for our construction procedure 
the two-dimensional implementation of the scattered-data (random dataset) 
modified Shepard method, A >-*• QSHEP2D (ACM Algorithm 660, [45]). At 
present both the Voronoi and Shepard methods are ‘‘Lagrangian,” based only 
on function values; as numerical and automatic differentiation techniques 
[46,47] become better developer!, “Hermitian” approximations incorporating 
sensitivity derivatives may prove more efficient. General multivariate (large- 
M) approximation theory remains an open research area. 

3.2.3 Eddy-Promoter Example 

We now apply the construction-validation procedure to the eddy-promoter 
flowrate and Nusselt number, Q(a,R ) and Q(a,R), respectively. In order to 
evoke the MOV Algorithm, the prerequisites listed in Section 3.1.1 must be 
supplied. The necessary quantities are all defined in Section 2.3, save the 
approximation procedure A, the importance function, />(£), the maximum 
number of evaluations, N ev , and the uncertainty tolerances, e\ and 1 2 . As 
described in Section 3.2.2, we use the Renka [45] implementation of the modi- 
fied Shepard method as our construction procedure. We take the importance 
function, p(p), to be uniform over the triangular input domain, reflecting 
no prior prejudice as to areas of potentially higher interest. Lastly, we set 
N ev = 44, £1 = .1, and e 2 = .1 ( N va = Af(e x = .l,e 2 = .1) = 22, and thus 
N eo = N va = 22). 

Implementation of the MCV Algorithm is now straightforward. First, as 
in the MV Algorithm, we employ a standard acceptance-rejection Monte- 
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Carlo method and a congruential pseudorandom number generator to pro- 
duce A reu = 44 points which are randomly and uniformly distributed over the 
input domain fl Ep . Then, the flowrate and Nusselt number are computed at 
each of these input points, {gj,qj} = {G( a j, fy)i Q( a ji j = l,-. -,N ev , 
following the field subproblem evaluation procedure described in Section 2.3. 
Next, these N ev = 44 input-output pairs are randomly partitioned into two 
subsets, a construction set of size N co — 22 for Step 4 of the MCV Al- 
gorithm, and a validation set of size N va = 22 for Step 5 of the MCV 
Algorithm. Finally, the surrogates are formed and tested. Our construc- 
tion method is slightly modified to include prior information: QSHEP2D is 
evoked in Step 4 not with N co points, but with N co + 2 points. The two ad- 
ditional contributions comprise a “‘plane-Poiseuille— flow” input-output pair, 
{{a+R} = {0,0}, g = 2/3), ({a, i?} = {0,0}, q = 1), and a prior-work eddy- 
promoter Tollmien-Schlichting input-output pair, ({a, R} = {-5, .2 }, g = 
.311), ({a,/?} = {.5, .2}, <7 = 1.12). (Although the Nusselt number for the 
extensively studied {a, R } = {.5, .2} geometry [35-37] can be estimated from 
published data [35], we prefer to exactly recompute g and q for the precise 
boundary conditions of the current paper.) 

For the particular (single) train-test realization of Figure 5, we obtain 
the flowrate and Nusselt number surrogates shown in Figure 6, and model 
prediction error estimates for the flowrate and Nusselt number of 3? E^^ = 
.035 and = .092, respectively. We thus conclude that, with confidence 

level greater than .90, the discrepancy between Q(a,R) (the Navier-Stokes 
solution) and Q (our surrogate) is less than .035 over more than 90% of fi ; 
similarly (but not jointly, see below), with confidence level greater than .90, 
the discrepancy between Q(a, R) (the Navier-Stokes solution) and Q (our 
surrogate) is less than .092 over more than 90% of fl EP . Discussing first the 
flowrate, we see from Figure 6 that, not surprisingly, for our fixed pressure 
gradient, the flowrate decreases for cylinders either farther away from the 
wall or of larger radius: both of these variations increase the drag on the 
eddy— promoter. The flowrate surrogate is rather accurate, with a model 
prediction error estimate of only .035 over an observed flowrate range of 0 < 
g < .667. Turning now to the Nusselt number, we see that the largest Nusselt 
number obtains for the larger-cylinder steady wavy mechanism, but that 
significant transport also occurs for the unsteady Tollmien-Schlichting mode. 
The Nusselt number surrogate is less accurate than the flowrate surrogate, 
with a model prediction error estimate of .092 over an observed Nusselt 
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number range of .764 < q < 1.186. 

The reader will notice that the input points are the same for the validation 
example of Section 3.1.3 (see Figure 4) and construction-validation example 
(see Figure 5) of the current section; furthermore, the flowrate and Nusselt 
number are both validated on the same set of input points. In essence, we are 
exploiting the databoard concept (see [2] and Section 5), in which a single 
set of input points is recycled for different models, outputs, or optimization 
studies. Note however, that, in the current single-output context (I< = 1), 
each example must be treated as a separate problem — the sample-size 
requirement (21) and associated error estimates (22) are not joint. In Section 
5 we describe the simple modification which permits us to state joint error 
estimates for multiple outputs validated over a common input set. 

4 Optimization Purposiveness 

4.1 Surrogate-Based Optimization 

We recall from Section 2.4 that the essential aspect of surrogate optimization 
is the replacement of the actual objective function, $(p,A) "= <£(<£(£); d; A), 

with a surrogate objective function, $(p,A) = A). Within this 

broad framework, however, several different approaches are possible. First, 
one can proceed in an “unvalidated mode,” in which one tests surrogate pre- 
dictions only at surrogate-proposed design points. This approach has the 
advantage that all points are dedicated to construction, but the disadvan- 
tages that: first, even if the surrogate is accurate at the proposed design 
point, one has no assurances as to the accuracy of the surrogate at other 
input points upon which selection of the design point may be conditioned 
(e.g., through gradient information or simple rejection); second, if the surro- 
gate is not sufficiently accurate at the proposed design point, an appropriate 
course of action is not clear. second approach, “learn-by-doing” (e.g., 
[48]), performs construction-validation during the optimization process; that 
is, the surrogate model input sample reflects the local structure of the ob- 
jective function. The Iearn-by-doing approach has the advantage that the 
importance function is, perforce, relevant to the local search process, but the 
disadvantage that the surrogate developed for one objective function may 
be inappropriate for a subsequent optimization study in which the objective 
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function has been modified. 

We pursue here a third approach, a “Bayesian validated” approach, in 
which the validation importance function, p{p), serves to indicate the antici- 
pated relative relevance of points within the feasible design space fi; ideally, 
in a parent optimization project involving numerous optimization studies 
parametrized by A, £*(A) would be “distributed” according to />(£*). It is 
critical to note that we rely on this definition to motivate, but not to jus- 
tify, the choice of p(p); techniques for determining the de facto influence of 
p(p) on any single optimization study (that is, particular A) are presented 
in Section 4.3. The Bayesian validated approach permits better a posteri- 
ori error estimates than the “unvalidated mode,” though at the expense of 
fewer points for construction. (In fact, validation points can, subsequent to 
validation in Step 5 of the MCV Algorithm, be included in a revised con- 
struction, however the resulting model no longer satisfies a rigorous error 
statement.) The Bayesian validated approach ensures greater flexibility than 
the “learn-by-doing” approach, though at the cost of lower relevance for any 
particular study. In summary, the Bayesian validated approach is probably 
better suited for initial, global studies than for final, local designs. 

4.2 Monte Carlo Algorithm 

Our surrogate-based optimization algorithm takes as given: (i) A subprob- 
lem, with an input A/-vector, p, an input domain, fl € St M , a (now possibly 
multiple) output, s 6 iR ft , and an input-output function, &>(£) 6 
(ii) A modelling (approximation) procedure, ^(E £), as described in Section 
3.2. (iii) An optimization evaluation procedure, <^(s;£; A.) : x x A — * 

St. (iv) A Bayesian importance function, p(£), satisfying (20). (v) The max- 
imum number of £(£) evaluations permitted, N ev . (vi) Two validation error 
tolerances, £ t ,£ 2 € [0, l] 2 . 

We now summarize the Surrogate-Based Optimization (SBO) Algorithm. 
Algorithm SBO(^(s; £; A), p(p), N ev , £i, £ 2 ) 

1. COMPUTE N va = V(£ ,,£ 2 ) from (21). 

2. IF N va > N ev QUIT; ELSE SET N co = N ev - N va . 

3. FOR j = 1, . . . , N ev \ 

SI. DRAW tj ~ Pis) 
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S2. COMPUTE Sj = £(£)• 

4. SET S( R ) = 4({(£,,&), (P N co,£ N co)}). 

5. FOR A = A\A 2 ,A 3 .- ■ . 

{ 6. FIND $min(A) and p*( A), where 

$mm(A) = <A(i£(£*(A));£*(A);A) < <£(£(£);£; A) Vg € 0 . (25) 

7. SET E^ x = maxj € {Afco +1> , jv«} where 

E* = l«H£,;£,;A) -«£(£,);£, ;A)I . 

8. COMPUTE a posteriori estimates (see Section 4.3). 

9. CONSIDER adaptive refinement (see Section 4.4). } □ 

(For simplicity of presentation we indicate that the first N co input points 
serve for construction and the last N va input points serve for validation; in 
practice, the sample of N ev input points is randomly partitioned into con- 
struction and validation subsets.) The $min(A) an d £*(A) of (25) will be de- 
noted the “surrogate minimum” (more precisely, the global minimum of the 
surrogate objective function) and the (or a) “surrogate minimizer,” respec- 
tively; $min(A) and £*(A) of (2) will be referred to as the “actual minimum” 
(more properly, the minimum of the actual objective function, $(£,A)) and 
“actual minimizer,” respectively L 

We make several comments concerning the SBO Algorithm. First, the 
A m of Step 5 are selected based on currently available information, including, 
perhaps, the results of previous optimization studies (that is, for A",** < m). 
Second, we remark that the surrogate objective function is constructed only 
indirectly; that is, rather than directly construct the objective function out- 
put from A applied to (£, <£(«£(£);£; A)) pairs, we first construct surrogates, 
£(p), for the “intermediate physical outputs,” £(p), in Step 4, and then 
simply evaluate <^>(£(£);£;A) in Step 6 and Step 7. For example, for the 
eddy-promoter heat exchanger, we construct (7) not directly from the sam- 
pled data, but indirectly from the intermediate surrogates for flowrate and 
Nusselt number described in Section 3.2.3. We prefer this two-stage ap- 
proach to direct construction of the objective function because: each new A m 
does not require re-appeal to a construction procedure; we are more likely 
to have prior information for the physical quantities than for an artificially 
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synthesized objective function. Third, we note that, despite the indirect 
construction-cum-evaluation of the surrogate objective function, we directly 
validate the surrogate objective function in Step 7, that is, we directly com- 
pute the errors in the objective function rather than (less precisely) infer 
these errors from the errors in £(£). From Step 7 and Section 3.1.2 we know 


Pr 


</ 


{pen | |*(£)-*(p)I<££«} 


p(p)d£ > 1 - £i } > 1 - £2 1 


(26) 


or, equivalently, 

Pr{jf # />(£)d£ < £ 1 } > 1 - £2 , ( 27 ) 

where U* = {p € fi 1 |$(£; A) - $(p;A)l > E^ x }. These (effectively single- 
output) objective function prediction error estimates are required for the a 
posteriori analysis described in Section 4.3. 

We close this section by remarking that our algorithm is related to, but 
significantly different from, several other stochastic optimization procedures 
[14]. First, as compared to the simplest random search procedure, in which 
the minimum is approximated as the minimum of a random sample, our 
approach offers two advantages: by constructing and subsequently minimiz- 
ing a surrogate, we can exploit whatever prior information may be available 
for, and whatever continuity may be present in, the objective function; the 
surrogate reveals internal error and sensitivity estimates not apparent from 
the bare “nodal values” used in the random search procedure. Second, as 
compared to multistart techniques, in which a random sample serves as the 
starting point for many parallel local (e.g., gradient-based) minimization 
problems: the multistart technique shares the disadvantages (but also ad- 
vantages) of direct insertion as regards each local search; the probabilistic 
estimates for the multistart technique, although ostensibly similar to our 
validation statements, are in fact expressed in terms of the (unknown) size 
of the basin of attraction associated with the global minimum. 


4.3 A Posteriori Estimates 

We discuss in this subsection what can be said of a definitive (or almost 
definitive) nature following surrogate-based optimization. Our goal is to un- 
derstand, for any given single realization, the influence of the selected impor- 
tance function, p(p), on the reliability of the surrogate-based optimization 
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process; our notion of success is thus very different than that adopted in 
[12], in which the quality of the surrogate is quantified only for the case of 
repeated trials according to the initially prescribed importance function. The 
a posteriori estimates of Step 8 serve in Step 9 first, to ascertain if the sur- 
rogate minimum is acceptable, and second, if the surrogate minimum is not 
acceptable, to guide subsequent adaptive improvement efforts. It is critical 
to note that, consistent with the notion of an expensive subproblem, all a 
posteriori estimates in Step 8 require no new appeals to £(p). 

4.3.1 General Case 

We state our result, present a brief formal derivation, and discuss the theo- 
retical and practical implications. To begin, for any r 6 (1, 1/ei), we define 
Xr to be the set of all closed domains, 71, in 0 for which f-% p(p)d^ = re j. 
We then set 

6 = min max{$(p) - , (28) 

R€xr p€K — 

K s = arg mnjmax{$(£) - $„„„}] . (29) 

In essence, the sensitivity region TZs is that (or a) region of relative weighted 
volume re x for which the deviation of the surrogate objective function from 
4>nun is minimal; this minimal deviation, 6, reflects the sensitivity of the sur- 
rogate objective function to variations in the design variables in the vicinity 
of the surrogate minimizer. We can now express a form of “lower semi- 
continuity:” with probability greater than 1 — 

3p 6 7Z s ,p 0 , such that $(p 0 ,A) < $mi„ + & , (30) 

where to, the “predictability gap,” is the random variable 

w^E^ + 6. (31) 

Note that (30) says nothing concerning the discrepancy between the surro- 
gate minimum and actual minimum or the surrogate minimizer and actual 
minimizer; without further hypotheses on $(p; A) and $(p; A)i no such state- 
ment can be made. Condition (30) does, however, say something concerning 
the reliability of the surrogate prediction: within a constructable region, TZs, 
there exists a point (in fact, many points) at which actual system performance 
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is within ts of the surrogate-predicted optimum. Our underlying strategy 
is to strive for a global minimum of $(p; A), but to require reliability in the 
surrogate prediction; our approach is thus, at least philosophically, related 
to the Taguchi approach to quality control [18,19]. 

To derive (30), we first note that, from (27), with probability greater than 
or equal to 1 — £ 2 , U* is of relative weighted volume less than or equal to £j. 
If U * is of relative weighted volume less than or equal to £ 1 , then TZs \ U® 
must be nonempty, since Its is of relative weighted volume strictly greater 
than £1 (recall r > 1). Then, for any point £' in TZs \ U*, 

|$( £ '; A) -$(£-; A)| = |^(£';A)-$( £ ';A) + $( E ';A)-$(£*;A)| 

< I*(e';A) - *(£'; A)| + !*(£'; A) - *(£•; A) I • 

As |$( £ ';A)-$(e';A)| < (recall £ ' and |%; A) - $( £ *; A)| < 6 , 

(30-31) directly follows, with £o = £ '. From this derivation it is clear that 
(30) applies to each A m of Step 5 separately, not jointly; however, the SBO 
Algorithm is readily extended such that (30) is jointly valid (see Section 5). 

In order to illustrate (30), we consider a simple model problem with 
M = 2, £ € fi = [—1,1] x [—1,1], an d p( £ ) = 1/4. We presume that the 
result of the SBO Algorithm is a surrogate objective function, 

$( £ ) = p\ + u 2 pl + 1 (w > 1) , (32) 

with minimum and minimizer = 1 and p’ = {0,0}, respectively, and pre- 
diction error estimate 1StE^ iAX . It is then readily computed that S = 4r£ 1 u^/ff, 
with TZs given by the elliptical region centered at the origin with major (pi) 
and minor (p 2 ) axes of \/S and s/SJu, respectively. The predictability gap is 
thus + 4r£iw/ff . It is clear, since we have not even defined the 

actual objective function, $( £ ;A)> that this analysis is based entirely upon 
appeals to the surrogate. This model problem is not, of course, completely 
arbitrary; the M-dimensional generalization of the objective function (32) is 
a local representation of any sufficiently differentiable objective function near 
an interior minimizer. 

Turning now to the implications of (30), we remark, first, that (30) quan- 
tifies the effect of a poorly selected p(p): we expect that 6, \R.$\ and w will 
be inversely proportional to p(p*(A))- Second, we understand the origin of 
the predictability gap, w, as distinguishable construction, an< ^ va ^‘ 

dation, 5, contributions. We expect that, as N co and N va — * 00 (£ 2 fixed), 
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Em* x ,6 and hence zu will tend to zero; furthermore, for any particular study 
at finite N ev , the ratio £ = %E£ uxx /6 provides valuable “construction versus 
validation” guidance for adaptive improvement (see Section 4.4). Third, the 
continuity statement (30) includes a notion of u £-sensitivity,” which we de- 
fine as the sensitivity of $(£; A) to variations in £. (We contrast £-sensitivity 
to “A-sensitivity,” which we define as the sensitivity of design points, such 
as $min(A) and £*(A), to variations in A.) Our u £-sensitivity” result should 
prove quite useful in preliminary optimization studies: if zu is acceptably 
small, and Us is acceptably located, the precise design point need not be 
specified, thereby maintaining maximal flexibility in the ensuing design pro- 
cess. This flexibility is particularly important when the optimization prob- 
lem, (2), reflects only one subsystem of a larger, more complex endeavor [49]. 

Remark: Random Search Revisited. It is readily shown that, with 
probability greater than 1 — ef (as £i — ► 0), a validation input point, 
resides in Its (and hence lt$ \ U *). Thus, even in the worst case, in which 
we simply set £ 0 of (30) to 3?Pj, the surrogate approach reproduces the sim- 
ple random-search result, and, additionally, provides: valuable p-sensitivity 
information through condition (30); convergence guidance through the quasi- 
convex analysis of the next section. □ 

4.3.2 Quasi-Convex Case 

We consider here the case in which ft is convex, and $(p; A) and $(£;A) 
are quasi-convex in the first argument. (A function /(£): ft — » JR is quasi- 
convex if: € ([0, l],ft,ft),/(a£ 1 +(l-a)£ 2 ) < max[/(£ x ),/(£ 2 )]; or, 

equivalently, the level sets, {£ € ft | /(£) < <p} , are convex [25].) We first state 
our main result: given a “separating value” random variable A = 2E% UiX + 8 
and an associated random “buffer zone,” 

Ba = {£G ft 1$ -£min > A} , 

a random “containing region,” AC, can be constructed in which, with prob- 
ability greater than 1 — e 2 > an actual minimizer, £*, must reside; further- 
more, as E% UiX and tend to zero, the region AC shrinks to p*, the surrogate 
minimizer. The AC-construction depends only on p(p),r, and the geomet- 
ric properties of Tt$ and Ba • In this section we: formally derive the AC- 
construction for a general one-dimensional (M = 1) optimization problem; 


30 


state the ^-construction for a particular two-dimensional (M = 2) model 
optimization problem; and discuss the implications. The derivation of the 
general M = 2 ^-construction is given in [32]; development of the general 
M > 2 ^-construction, though tractable, is rather involved and not yet 
complete. All results presented are for the case of uniform p(£). 

We now proceed with the AC-construction for the general one-dimensional 
(M = 1) quasi-convex optimization problem. We introduce the (perforce 
convex) regions fl = \Pq,Pq], ^ ]» = [Pn>^£) u and 

AC ss [P^ — £j , P£ + £i] shown in Figure 7 (note are random variables); 
for clarity of exposition, we assume that pj > P^iPs < P&, and P& > 
Pn + £i ,P£ < Pn — £\. First, from (27) we know that, with probability 
greater than or' equal to 1 — e 2 , U* is of total length less than or equal to 
£x; note that U* need not be convex. If U * is of total length less than or 
equal to e lt then: there exists a point p' in 7 Z$ \M*\ for any point p'" in 
fl \ AC, there exists a point p" in B& fl AC \ U * for which p'" < p" < p' or 
p' < p" < p'". For example, to prove the latter, take (say) p'" > P£ +£ i, and 
assume that no point p" in 5^ fl AC \ U* exists such that p' < p" < p"'\ but 
then, {P£,p'") C U* , and since p'" > Pl+e i, we arrive at a contradiction. 
We next claim that, for p' in 7 Z$ \U * and p" in fl AC \ U*, 


*(p"; A)>$min + £;L + * 

(33) 

and 


*(p";A) > *(p';A) • 

(34) 

Inequality (33) follows from 



|$(p*;A)-$(p";A)| = |l(p*;A)-^(p' , ;A) + $(p";A)-$(p";A)| 

> - «(p"; A)l - l$(p"; A) - *(/; A)l . 

and |*(f ;i) - *(p";A)l > 2 EL, + 6, |*(p"lA) - *(p";A)l < EL, (recall 

p" ^ U 9 ). Inequality (34) follows from (33) and the results of Section 4.3.1. 
Lastly, the event (say) p' < p" < p'" (p' in 1Z$ ?" in ^ H AC \7/*, p'" 

in fl \ AC) implies, from the inequality (34) and quasi-convexity, $(p"; A) < 
max[$(p'; A), ${p'"\ A)]> that $(p"; A) can not be greater than ^(p^'.A); thus, 
there exists a minimizer of <£(p; A) within AC. This proves the desired result, 
and constructs the requisite region, AC. 

We next pass to the two-dimensional case, and present the AC-construction 
for the particular model problem discussed in Section 4.3.1: M = 2; £ 6 fl = 


31 


[—1,1] x [— 1,1]; p(g) = 1/4; surrogate objective function (32); and predic- 
tion error estimate, As described in Section 4.3.1, 6 = Are^/ir, with 

Tig given by an elliptical region with (major(pi),minor(p 2 )) axes 1/w)- 

Following an analysis conceptually similar to — though technically more 
complicated than — our analysis for the one-dimensional optimization prob- 
lem, we find that, as 6 — * 0, is the exterior of an ellipse-like (though not 
exactly elliptical) region of (major(pi),minor(p 2 )) axes + 2£(1, 1/w), 

and is an ellipse-like region of (major(p 1 ),minor(p 2 )) axes 

V6{y/l + 2C + 2/r[l + yi+r^l+2C]}(l, l/w) , 

where £ = Note that, as 3?£'^ ax and £—►(), dtIC shrinks to p*. 

The implications of our results are clear. First, from the theoretical per- 
spective, we obtain convergence of the surrogate minimizer (and probably, 
with convexity, the surrogate minimum) to the actual minimizer (actual min- 
imum) as N ev — » oo. Second, from the practical perspective, we provide 
a natural framework in which to pursue search-domain reduction strategies 
[50,51]: an initial surrogate-based optimization study over 0 provides the de- 
sign space, 3?£, for subsequent adaptive improvement; with high confidence, 

9R/C contains the requisite global minimizer. This search-reduction approach 
is, in practice, hampered by the pessimistically large regions K that result 
from the rather minimal assumptions placed on $(g; A) and $(p; A). 

4.4 Adaptive Improvement 

IF (i) on the basis of the surrogate minimum $min(A)» the surrogate mini- 
mizer, g*(A), the surrogate prediction error estimate, the predictabil- 

ity gap, 5ic7, the sensitivity region, 1 Zg, and the containing region, 3i£, the 
surrogate minimum and minimizer of Step 6 are deemed unacceptable, AND 
IF (ii) further appeals to £(g) are permitted (e.g., N ev reflects only part of 
the total resource allocation for the entire project, or additional resources 
can be renegotiated given the optimization results to date), THEN a pos- 
teriori estimate-based adaptive improvement can be pursued. (Note that if 
(i) is false, then we simply declare success and return to Step 5 of the SBO 
algorithm; however, if (i) is true but (ii) is false, we must admit defeat. In 
the latter case, the a posteriori estimates serve the unpopular but valuable 
function of qualifying the surrogate minimum and minimzer.) 

1 I 
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Two adaptive improvement branches (from the local bindings associated 
with the parent SBO procedure) can be envisioned. In the first branch, 
we simply evoke one additional appeal to <£(p) to determine actual system 
performance at £ = £*, <^(£(£*);£’; A); all existing information implicates £* 
as a viable design point, and this possibility therefore merits investigation 
before proceeding further. In the second branch, we recursively evoke a 
second instantiation of the SBO Algorithm, in which we re-appeal to the 
databoard (see Section 5) or expensive simulation £(p) in order to: if £ = 
?R E^ x /6 > 1, devote additional input-output pairs to the validation of new 
models (see Section 5) or to the further construction of existing models; if 
£ = < 1, devote additional input-output pairs to the validation 

of existing models. Re-appeal to the databoard or *>(£) will typically be 
accompanied by search-domain modification, in which we, say, focus p(p) in, 
or relocate fl to, the 9R£-neighborhood of the current surrogate minimizer. 

Adaptive, or multipass, strategies, in which optimization information 
feeds back to the surrogate hypotheses, are clearly a necessity. As in ex- 
perimental data collection procedures [7], all diagnostic (here simulation) 
resources should not be expended in the first salvo; unfortunately, as for ex- 
perimental inquiries, we can hope to proffer only “rules of thumb” as to when 
to commit resources. More explicit strategies for, and examples of, multipass 
interaction [49] will be addressed in future papers. 


4.5 Eddy-Promoter Heat Exchanger 

We consider now the eddy-promoter heat exchanger example. The objec- 
tive function is given by (8); we choose p(p) to be uniform over fl EP ; we set 
N ev = 44 and £i = ti = .1. The flowrate and Nusselt number interme- 
diate physical surrogates are constructed as in Section 3.2.3 based on the 
construction sample shown in Figure 5. For the definition vector we take 
A ep = A EP ' 1 = { 0 i — 10 -7 , 02 — -20, /?3 = 1.1, k = 2.0}. Proceeding to Step 
6 of the SBO Algorithm, we find (for this two-dimensional case by a sim- 
ple search) ^^ n (A EP ’ 1 ) = .223, £ EP *(A EP ’ 1 ) = (.40, .14) for the surrogate and 
surrogate minimizer, respectively; a contour plot of the surrogate objective 
function is shown in Figure 8. From Step 7 of the SBO Algorithm we find, for 
the particular validation sample shown in Figure 5, $E% UkX = .0605. Then, 
from the a posteriori analysis of Step 8, we calculate (for r = 2) 6 = .029, and 
thus IftvD = .090 and £ = 2.14; we show in Figure 8 the region Its in which, 
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from (30), with confidence level greater than 1 — £2, we can find a £ EP >£^, P , 
such that $ ep (£^ p ,A ep ' 1 ) (actual system performance) < $^ n (A EP,1 )+.090. In 
Step 9, we re-appeal to the Navier-Stokes subproblem for p EP = £ EP '(A EP ’ 1 ) 
to obtain $ EP (g EP *(A EP ’ 1 ); A EPl1 ) = .253, quite close to the value predicted by 
the surrogate objective function. 

We now return to Step 5, and reselect A EP ' 2 = {/?i = 10 _6 ,/?2 = .01, #3 = 
1.1, k = 2.0}, because (say) the optimum predicted in Figure 8 corresponds 
to a pump size which is unexpectedly large. Exploiting the same samples 
and intermediate physical surrogates as for the first study (recall the result- 
ing estimates are not joint pending the revised multiple-output sample-size 
requirement of Section 5), we find, in Step 6 and Step 7, $m1n(A EP ’ 2 ) = 
.190, g EP *(A EP ’ 2 ) = (.80, .74), and = .0611 for the surrogate, surro- 

gate minimizer, and prediction error estimate, respectively; a contour plot 
of the surrogate objective function is shown in Figure 9. Pursuing in Step 
8 our a posteriori analysis, we calculate (for r = 2) 6 = .022, and thus 
7 = .083 and C = 2.77; we show in Figure 9 the region Its in which, from 
(30), with confidence level greater than 1 — £2, we can find a £ EP ,£^ P , such 
that $ EP (£^ P , A EP ’ 2 ) (actual system performance) < $ S, p n (A EP ’ 2 ) + .083. Note 
that, even if we falsely presume global quasi-convexity, the region 3 VC in 
which the actual minimizer must lie is, disappointingly, essentially fl EP . In 
Step 9 we re-appeal to the Navier-Stokes subproblem for g EP = £ EP *(A EP ’ 2 ) to 
obtain $(£ EP *(A EP ' 2 ); A EP ’ 2 ) = -194, again quite close to the value predicted by 
the surrogate objective function. It is not surprising that, with the increased 
(decreased) penalty on pumping power (materials cost), optimal performance 
now occurs at a lower flowrate in the vicinity of the global maximum in heat 
transfer. 

5 Extensions 


Classification Procedures. In classification problems we search for a re- 
gion of input space in which certain conditions are satisfied; for example, 
in the eddy promoter problem, we might be interested in that region of 
Q EP in which the heat transfer rate, (2(a, #), is greater than a prescribed 
threshold. In such situations, it is clearly advantageous to replace (say) the 
Navier-Stokes equations with a less expensive — surrogate — {0,1} char- 
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acteristic function. For noisy classification problems, statistical prediction 
rules [9] or neural network approaches [11] prove effective; for deterministic 
simulation-based classification problems, we develop in [2] a Boolean Voronoi 
construction method and a binomial-tail-statistic [16] validation technique. 
Modelling and classification algorithms share much in common as regards 
both motivation and formulation, and, in the future, must be combined in 
a single procedure, in which a model-based surrogate objective function is 
minimized over a classification-based feasible domain. 

The Databoard. The databoard concept, developed in [2], permits in- 
vestigations defined over different input domains to share data. The tech- 
nique, based on simple conditional samplying procedures, is best illustrated 
by an example. Consider the second eddy-promoter heat exchanger opti- 
mization problem considered in Section 4.4, in which the minimum is near 
the {a = 1.0, R — .95} vertex of fl EP . Assume, however, that the original 
design space is defined to be not the triangular fl EP , but, rather, the square 
domain fl E j p shown in Figure 10. Upon minimization, the minimizer will 
no doubt reside on the boundary of fl EP , perhaps prompting the investigator 
to expand the design space to (say) the full triangle, fl EP . In performing 
the subsequent adaptive improvement over Q EP , the researcher can: recycle 
existing data from fi EP ; evoke new simulations only when fl EP is depleted, 
or when the requested input point lies in fl EP \ fl EP ; post new simulations to 
the databoard for the benefit of future investigations. The concept is readily 
expanded to permit rather general input domains and both modelling and 
classification studies; furthermore, if raw, rather than processed, simulation 
data is posted to the databoard, significant output flexibility can be achieved. 

Multiple— Output Validation. We discuss here the generalization of the 
Model Validation Algorithm (and, by obvious extension, the MCV and SBO 
Algorithms) to the case of multiple outputs, K > 1. In particular, we con- 
sider the situation in which we wish to validate K > 1 outputs (e.g., for the 
eddy-promoter problem, the flowrate and the Nusselt number) at the same 


sample input points, P_ j, . . . , Em**, 



E k = max E k , k — 1 , . . 


(35) 

m “ J6{1 J 

£‘ = |S‘(£ J )-5‘(£,)|, J = !,-■ 

■ ,N va , 

(36) 


35 


where S k {]3) and S k (jo) refer to the k th component of the £(£) and 2(g) 
vectors, respectively. It is readily shown [32] by multinomial extension of the 
binomial arguments described for the single-output case that, if we simply 
replace the sample-size requirement of Step 1 of the MV Algorithm with 

N va = ln(£2//l ° , (37) 

ln(l - £i) 

then the K outputs jointly satisfy a validation statement, 


Pr 






p(E) d R ^ 1 -£u k 


I A'}>!-£ 2 . (38) 


Note to arrive at the simple expression (37), certain quantities in the multi- 
nomial expansion are bounded. However, the requirement (37) is often pes- 
simistic in practice not because of the bounds, which are rather sharp for 
K < 1/cj, but because most actual output vectors are better correlated than 
the worst-case assumption inherent in (37). The remarkable conclusion from 

(37) is that only logarithmically more simulations must be performed in order 
to jointly validate multiple outputs over a common input set; indeed, if (say) 
e 2 = .01, K = 100 outputs require only twice the sample size as a single 
output ( K = 1). This logarithmic dependence further justifies the surro- 
gate concept; if, to obtain joint estimates, the sample size grew linearly with 
K, the surrogate approach would be not too different from direct insertion 
procedures as regards adaptability. 

We mention three applications of the multiple-output theory. First, mul- 
ticriteria optimization frameworks can now be addressed. Second, multiple- 
optimization studies (A 1 , A 2 .. .) can be jointly validated so that confidence 
can be assured not only in the final study, but in all earlier studies on which 
the final study is conditioned: by replacing Step 1 of the SBO Algorithm 
with (37), the A variation in Step 5 is now jointly justified. Third, if we 
interpret the K outputs as the K errors associated with a single physical 
output approximated by K different models, (37) permits efficient model- 
optimal construction procedures: we replace Step 1 of the MCV Algorithm 
with (37); we test several candidate models according to (35); we choose 
the best model based on accuracy or cost criteria [5,6]; we are assured, from 

(38) , that our rank ordering is significant, and that our validation statement 
applies to the particular model selected. Sequential procedures can also be 
pursued. 
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6 Conclusions 


Surrogate techniques constitute an attractive alternative to “direct insertion” 
for the incorporation of large-scale simulations into engineering optimiza- 
tion studies. Simulation surrogates provide direct resource control, support 
flexibility in design objectives and specifications, and gainfully accomodate 
prior information. Furthermore, the particular construction-validation pro- 
cedures proposed here enjoy a posteriori error estimates that permit both 
qualification of surrogate results and guidance for subsequent adaptive im- 
provement. Much additional work is required, however, if the simulation 
surrogate framework is to prove useful in engineering design. In particular, 
multipass adaptive refinement strategies must be articulated for both single- 
and multiple-optimization-study projects, with emphasis on construction- 
validation refinement, search domain reduction and relocation, and gradu- 
ated deployment of resources. 

7 Acknowledgements 

We would like to thank Nektonics, Inc., and Fluent, Inc., for making the 
NEKTON code available for this work; we would like to thank Mr. Mariano 
Gurfinkel of M.I.T., Drs. Einar R.0nquist and Lee-Wing Ho of Nektonics, 
Inc., and Professor Paul Fischer of Brown University for their help in inte- 
grating the parallel NEKTON code into the surrogate framework. We would 
also like to acknowledge fruitful discussions with Drs. Jaroslaw Sobieski and 
Thomas Zang of NASA Langley Research Center. This work was supported 
by DARPA Grant N00014-91-J-1889, ONR Grant N000I4-90-J-4124, ONR 
Grant N00014-89-J-1610, and NASA Contract NASl-19480. 


References 

[1] J. Sobieszczanski-Sobieski. Sensitivity of complex, internally coupled 
systems. AIAA Journal , 28(1):153— 160, 1990. 

[2] S. Yesilyurt and A. T. Patera. Statistical modelling methods for deter- 
ministic computational systems. J. Comput. Phys., 1992. Submitted. 


37 


[3] L. Ljung. System identification: Theory for the user. Prentice-Hall, New 
Jersey, 1987. 

[4] K. H. Reckhow. Validation of simulation models: Philosophy and sta- 
tistical methods of confirmation. In M. G. Singh, editor, Systems and 
Control Encyclopedia, volume 7, pages 5011-5015. Pergamon Press, New 
York, 1987. 

[5] C. Heij. Deterministic identification of dynamical systems. Number 127 
in Lecture Notes in Control and Information Sciences. Springer- Verlag, 
Berlin, 1989. 

[6] T. Bohlin. Interactive system identification: Prospects and pitfalls. 
Springer- Verlag, Berlin, 1991. 

[7] G. E. P. Box, W. G. Hunter, and J. S. Hunter. Statistics for experi- 
menters. John Wiley Sc Sons, New York, 1978. 

[8] M. Stone. Cross- validatory choice and assessment of statistical predic- 
tions. J. Roy. Stat. Soc. Ser. B , 36:111-147, 1974. 

[9] B. Efron. Estimating the error rate of a prediction rule: Improvement 
on cross-validation. J. Amer. Stat. Assoc., 78(382):316— 331 , 1983. 

[10] S. M. Weiss and C. A. Kulikowski. Computer systems that learn. Morgan 
Kaufmann, San Mateo, 1991. 

[11] J. A. Leonard, M. A. Kramer, and L. H. Ungar. A neural network 
architecture that computes its own reliability. Computers and Chemical 
Engineering, 1991. Submitted. 

[12] L. G. Valiant. A theory for the learnable. CACM , 27 ( 1 1 ) : 1 1 34— 1 1 42 , 
1984. 

[13] S. I. Gallant. A connectionist learning algorithm with provable general- 
ization and scaling bounds. Neural Networks, 3:191-201, 1990. 

[14] R. Y. Rubinstein. Simulation and the Monte-Carlo method. John Wiley 
Sc Sons, New York, 1981. 



[15] M. Adams and V. Guillemin. Measure theory and probability. Wadsworth 
& Brooks, Monterey, California, 1986. 

[16] J. W. Pratt and J. D. Gibbons. Concepts of nonparametric theory. 
Springer- Verlag, New York, 1981. 

[17] R. Franke and G. M. Nielson. Scattered data interpolation and appli- 
cations: A tutorial and study. In H. Hagen and D. Roller, editors, Geo- 
metric Modelling: Methods and Applications , pages 131-160. Springer- 
Verlag, Berlin, 1990. 

[18] G. Taguchi. Introduction to quality engineering. Asian Productivity 
Organization, Tokyo, 1986. 

[19] V. N. Nair, editor. Taguchi’s parameter design: A panel discussion. 
Technometrics , 34(2) : 127—161, 1992. 

[20] M. D. McKay, R. J. Beckman, and W. J. Conover. A comparison of 
three methods for selecting values of input variables in the analysis of 
output from a computer code. Technometrics , 21(2):239-245, 1979. 

[21] J. Sacks, W. J. Welch, T. J. Mitchell, and H. P. Wynn. Design and anal- 
ysis of computer experiments. Statistical Science , 4(4):409-435, 1989. 

[22] J. Sacks, S. B. Schiller, and W. J. Welch. Design for computer experi- 
ments. Technometrics , 31( 1 ):41— 47, 1989. 

[23] D. D. Cox and S. John. A statistical method for global optimization. 
Technical Report 67, Department of Statistics, University of Illinois, 
Champagne-Urbana, 1992. 

[24] R. W. Lewis and Y. Zheng. Coarse optimization for complex systems: 
An application of orthogonal arrays. Comp. Meth. Appl. Mech. Eng., 
94:63-92, 1992. 

[25] M. Avriel. Nonlinear proramming: Analysis and methods. Prentice-Hall, 
Englewood Cliffs, New Jersey, 1976. 

[26] P. F. Fischer and A. T. Patera. Parallel simulation of viscous incom- 
pressible flows. Annual Review of Fluid Mechanics , 26, 1994. 


39 


[27] Y. Maday, A. T. Patera, and E. M. R0nquist. An operator-integration- 
factor splitting method for time dependent problems: Application to 
incompressible fluid flow. J. Sci. Comput., 5(4):263-292, 1990. 

[28] E. R0nquist. A domain decomposition method for elliptic boundary 
value problems: Application to unsteady incompressible fluid flow. In 
D. E. Keyes, T. F. Chan, G. A. Meurant, J. S. Scroggs, and R. G. Voigt, 
editors, Proc. Int. Conf. on Domain Decomposition Methods for Partial 
Differential Equations, 5th , pages 545-557. SIAM, Philadelphia, 1992. 

[29] Y. Maday and Patera A. T. Spectral element methods for the Navier- 
Stokes equations. In A. K. Noor, editor, State-of-the-art surveys in 
computational mechanics. ASME, New York, 1989. 

[30] P. F. Fischer and A. T. Patera. Parallel spectral element solution of the 
Stokes problem. J. Comput. Phys., 92(2):380-421, 1991. 

[31] R. A. Nicolaides. Deflation of conjugate gradients with applications to 
boundary value problems. SIAM J. Num. Anal., 51(2):355— 365, 1987. 

[32] S. Yesilyurt. PhD Thesis. Department of Nuclear Engineering, M.I.T.. 
In progress. 

[33] 0. Pironneau, W. Rodi, I. L. Ryhming, A. M. Savill, and T. V. Truong, 
editors. Numerical Simulation of Unsteady Flows and Transition to Tur- 
bulence. ERCOFTAC, Cambridge University Press, Cambridge, 1992. 

[34] M. S. Isaacson and A. A. Sonin. Sherwood n umb er and friction fac- 
tor correlations for electrodialysis systems, with application to process 
optimization. I & EC Process Des. Dev., 15:313, 1976. 

[35] G. E. Karniadakis, B. B. Mikic, and A. T. Patera. Minimum-dissipation 
transport enhancement by flow destabilization: Reynolds’ analogy revis- 
ited. J. Fluid Mech., 192:365-391, 1988. 

[36] H. Kozlu, B. B. Mikic, and A. T. Patera. Minimum dissipation heat re- 
moval by scale-matched flow destabilization. Int. J. Heat Mass Transfer, 
31(10):981— 991, 1988. 



[37] M. F. Schatz, R. P. Tagg, H. L. Swinney, P. F. Fischer, and A. T. 
Patera. Supercritical transition in plane channel flow with spatially 
periodic perturbations. Phys. Rev. Lett., 66(12):1579-1582, 1991. 

[38] E. P. L. Roberts. Numerical and experimental study of transition pro- 
cesses in obstructed channel flow. J. Fluid Mech., 1993. To appear. 

[39] J. F. M. Barthelemy and J. Sobieszczanski-Sobieski. Optimum sensitiv- 
ity derivatives of objective functions in nonlinear programming. AIAA 
Journal, 21(5), 1983. 

[40] J. A. Bennet and M. E. Botkin, editors. The optimum shape: Automated 
structural design. General Motors, Plenum Press, New York, 1986. 

[41] H. A. David. Order statistics. John Wiley & Sons, New York, 1970. 

[42] A. M. Mood, F. A. Graybill, and D. C. Boes. Introduction to the theory 
of statistics. McGraw-Hill, New York, 1974. 

[43] CRAY Research, Inc., Mendota Heights, Minnesota. CF77 compiling 
systems, volume 1: Fortran reference manual, 5th edition, 1991. SR- 
3071, 5.0. 

[44] A. Okabe, B. Boots, and K. Sugihara. Spatial tessellations. John Wiley 
& Sons, New York, 1992. 

[45] R. L. Renka. Algorithm 660: QSHEP2D: Quadratic Shepard method 
for bivariate interpolation of scattered data. ACM TOMS, 14:149-150, 
1988. 

[46] P. A. Newman, G. J.-W. Hou, H. E. Jones, A. C. Taylor III, and V. M. 
Korivi. Observations on computational methodologies for use in large- 
scale, gradient- based, multidisciplinary design (AIAA Paper 92-4753). 
In Symposium on Multidisciplinary Analysis and Optimization, 1992. 

[47] C. Bischof, G. Corliss, L. Green, A. Griewank, K. Haigler, and P. New- 
man. Automatic differentiation of advanced CFD codes for multidisci- 
plinary design. Journal on Computing Systems in Engineering, 1992. 
Submitted. 


41 


[48] J. L. Rogers and W. J. LaMarsh II. Application of a neural network to 
simulate analysis in an optimization process. Artificial Intelligence in 
Design, 1992. 

[49] J. Sobieszczanski-Sobieski. Aircraft optimization by a system approach: 
Achievements and trends. Technical Memorandum 107622, NASA, 1992 

[50] M. F. Berger and H. F. Silverman. Microphone array optimization by 
stochastic region contraction. IEEE Trans. Signal Proc., 39(11):2377- 
2386, 1991. 

[51] R. Spaans and R. Luus. Importance of search-domain reduction in ran- 
dom optimization. J. Opt. Theor. Appl. , 75(3):635-638, 1992. 


42 




















*(p"';A) 


*(p"; A) 


*(p';A) 



Figure 7: Regions Q, IZs, #a, and AC associated with one-dimensional quasi-convex opti- 
mization problem. 
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Figure 8: Contour plot for $ EP (g EP ; A 121 *’ 1 )- The surrogate minimum over fi EP is $m P n (A EP ’ 1 ) = 
.223; the surrogate maximum over Q EP is .550; contours delineate level sets of relative 
weighted volume .1, .2, . . . , .9. The dashed contour encloses for r = 2. 
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Figure 9: Contour plot for $ EP (p EP ; A EP ’ 2 ). The surrogate minimum over ft EP is $m P n (A EP ’ 2 ) = 
.190; the surrogate maximum over fl EP is .385; contours delineate level sets of relative 
weighted volume .1, .2, . . . , .9. The dashed contour encloses tRIls for r = 2. 


51 





Figure 10: Hypothetical initial, Qf p , and subsequent, Q EP , design spaces for databoard 
example. Input points for the subsequent study comprise new points (A) and points recycled 
from the initial study (□). 
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